Method for controlling the communication of individual computers in a multicomputer system

ABSTRACT

A method for controlling the communication of single computers in a computer network, in which the single computers are connected with each other via a standard network LAN and a high performance network SAN. Each single computer comprises, in an operating system kernel, a protocol unit connected with the standard network LAN for servicing communication protocols, and a library, which is connected in front of the operating system kernel and on which applications are layered via a communication interface. The selection between the standard network LAN and the high performance network SAN occurs in a network selection unit. The network selection happens after the communication interface of the library and before or immediately after access to the operating system kernel. If the network selection happens before access to the operating system kernel, the library can be connected with the high performance network SAN via a communication path that bypasses the operating system kernel.

[0001] The invention relates to a method for controlling the communication of single computers in a computer network, which is intended to make it possible to use the network of single computers as an efficient parallel computer.

[0002] Single computers, or so-called workstation computers, which may be a conventional personal computer (PC) or a workstation, have improved greatly in recent years with regard to their computing speed and therefore their computing performance, so that a large number of programs, in both the private and the commercial field, can be executed on them. In the commercial field in particular, for example for the organization of medium-sized or large companies or for the simulation of application and production sequences, but also in research and science, even the highest-performance PCs currently available are not adequate to process the pending amount of data in an economically acceptable time. For such computing-intensive tasks, so-called large-scale computers usually have to be used, which are, however, very cost-intensive.

[0003] For a long time, attempts have been made to provide a cost-effective alternative to large-scale computers by setting up a computer network from several single computers connected in parallel, each of which is realized by a common PC. The single computers usually comprise standard processors, which, in contrast to special processors, have a better price-performance ratio and shorter development cycles. The setup or architecture of such a computer network, which is also designated as a parallel computer, is therefore limited to the expansion of the uni-processor architecture by a communication interface, which realizes the communication between the single processors, and to the replication of the expanded uni-processor architecture.

[0004] To set up a known computer network, a number of workstation computers, so-called (computing) nodes, a special high performance network SAN (System Area Network), which is operated in addition to a standard network LAN (Local Area Network), and an operating system for the computing nodes are used. At present, systems with common processors are used as workstation computers or computing nodes in a known computer network. Apart from single-processor systems (uni-processors), small SMP (symmetric multiprocessor) systems (dual processors) may also be used as node computers. The configuration of the node computers (main memory, hard disks, processors, etc.) depends to a great extent on the user's requirements.

[0005] The conventional process for integrating a network into the operating system of a single computer is depicted in FIG. 1. The main components in an operating system kernel 10 are a first network-specific device driver 11 for the standard network LAN and a second network-specific device driver 12 for the high performance network SAN. The two device drivers 11 and 12, which are arranged in parallel, are accessed depending on an upstream network selection unit 13, and each adapts the network-specific communication interface to the interface expected by the network protocols. The specific control of the network and its integration are to a great extent hidden from the user. The user continues to use the communication operations provided by a system library 14, and only after the communication protocols have been serviced in a protocol unit 15 within the operating system kernel 10 does the actual transition to the different communication networks take place. The conventional communication path therefore runs from an application A or B, possibly using a programming environment 26, via the communication interface 23 of the system library 14, the access to the operating system 10, the servicing of the communication protocols in the protocol unit 15, the network selection in the network selection unit 13 and the control via the respective device driver 11 or 12, to the access to the hardware in the form of a network card 16 or 17 belonging to the selected network.

[0006] Given a suitable balance between communication performance and computing performance, large-scale computers are of considerable significance, as shown by their increasing distribution. However, the success of computer clusters as a cost-effective alternative to large-scale computers has so far been rather moderate and limited to special application classes with low communication effort. One reason for this lies in the low to moderate data transfer performance of the communication networks available in the LAN sector, a limitation that was removed by the arrival of new high performance networks in the SAN sector. When such high performance networks are used, however, it very soon becomes obvious that the conventional communication path anchored in the operating system, as described above, is not able to exploit the performance potential of the high performance networks even approximately. The reason for this lies in the architecture of the communication path itself and in the use of standardized communication protocols (e.g. TCP/IP), none of which were designed for the needs of parallel processing but rather for use in wide area networks.

[0007] The limited functionality of the networks used in the wide area field necessitates mechanisms for path finding, flow control, fragmentation and defragmentation, sequence maintenance, buffering, error detection and error handling, all of which are included in the functional scope of standardized communication protocols (e.g. TCP/IP). Furthermore, standardized communication protocols often provide functionality that is rather a hindrance when used in parallel systems. This includes, in particular, fixed packet sizes, elaborate checksum calculations, several protocol layers and a large amount of information in the packet headers. The unavoidable provision of this information costs time, which from the program developer's perspective is undesired delay. To make matters worse, a communication path as depicted in FIG. 1 is not able to adapt itself to the functionality of the underlying networks and always assumes only a minimum scope of functionality. When special high performance networks are employed, precisely this leads to functionality that is already present being implemented again within the protocol software, which considerably delays protocol servicing and decisively hinders the application layered on top of it.

[0008] To solve this problem, it is known to use methods for latency reduction, which focus on eliminating inefficiencies on the communication path as far as possible. The starting points are not only the communication networks employed but above all the network protocols employed, the interaction with the operating system, and the definition and capabilities of the communication interface provided on the application layer. The reduction of communication latency is based on the targeted relocation of tasks from higher layers to lower layers of the communication path or to the communication hardware, which leads to a restructuring of the communication path in its entirety.

[0009] The invention is based on the problem of providing a method for controlling the communication of single computers in a computer network in which the communication latency is substantially reduced and the data throughput is increased.

[0010] According to the invention, this problem is solved by a method according to claim 1. The basic idea is to provide the conventional communication path only for the standard network LAN and to provide, in parallel to it, a second communication path for the high performance network SAN, which allows an application direct access to the SAN communication hardware while bypassing the operating system at least to a great extent, so that the communication hardware can be controlled from the address space of the user. This procedure makes it possible to remove the operating system and the conventional communication protocols completely from the efficiency-critical path of the communication operations. The applications on the user side are layered at least on a library, in which or immediately after which a network selection unit selects one of the two networks. The network selection thus occurs before the servicing of the protocol, which takes place within the operating system. Relocating the selection of the network, which in a conventional architecture of the communication path occurs only between the servicing of the protocol and the device drivers, makes it possible to reroute the communication connection to the faster additional communication path at an early stage, i.e. before or immediately after access to the operating system kernel and above all before the servicing of the communication protocols. However, this rerouting only takes place if the desired communication connection can also be handled via the high performance network and therefore via the additional communication path. If this is not the case, the conventional communication path through the operating system is used. It has turned out that efficient parallel processing with high performance and flexibility can be achieved in this way in a network of coupled workstation computers.
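
The selection rule of paragraph [0010] can be illustrated with a minimal Python sketch. The class name `NetworkSelector`, the node names and the `san_available` flag are assumptions made purely for illustration; the description does not prescribe any data structure. The sketch only shows that the decision is taken on the basis of the target before any protocol servicing, and that the LAN path remains the default.

```python
from dataclasses import dataclass, field


@dataclass
class NetworkSelector:
    """Hypothetical stand-in for a network selection unit (illustration only)."""

    # Targets assumed to be reachable over the high performance network SAN.
    san_nodes: set = field(default_factory=set)
    # Assumed flag: whether the SAN environment is currently usable.
    san_available: bool = True

    def select(self, target: str) -> str:
        # The decision is made before any communication protocol is serviced.
        if self.san_available and target in self.san_nodes:
            return "SAN"
        # Otherwise the conventional path through the operating system is used.
        return "LAN"


selector = NetworkSelector(san_nodes={"node1", "node2"})
print(selector.select("node1"))      # target inside the cluster
print(selector.select("fileserver")) # target outside the cluster
```

In this sketch a target that is not known to be reachable over the SAN is never rerouted, which mirrors the condition that rerouting only takes place if the connection can be handled via the high performance network.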

[0011] Further details and features of the invention can be seen from the following description with reference to the drawings, which show:

[0012] FIG. 1 a schematic diagram of a conventional communication architecture,

[0013] FIG. 2 a schematic diagram of an inventive communication architecture according to a first embodiment,

[0014] FIG. 3 a schematic comparison of the communication architectures according to FIGS. 1 and 2 with explanation of the amendments according to the method,

[0015] FIG. 4 a schematic diagram of an inventive communication architecture according to a second embodiment,

[0016] FIG. 5 a schematic comparison of the communication architectures according to FIGS. 1 and 4 with explanation of the amendments according to the method,

[0017] FIG. 6 a schematic diagram of an inventive communication architecture according to a third embodiment and

[0018] FIG. 7 a schematic comparison of the communication architectures according to FIGS. 1 and 6 with explanation of the amendments according to the method.

[0019] First, the basic components of a computer network that uses the inventive method are described. Conventional PCs are used as single or node computers, as already described above. In addition to the standard network LAN, the high performance network SAN is operated, which should have a transmission capacity that is as high as possible, for example 1.28 Gbit/s, a multidimensional, freely selectable, scalable network topology and a freely programmable network adapter. Such a high performance network is known as such.

[0020] Unix or a derivative is used as the operating system of the node computers within the computer network. In addition, system software is necessary to build the computer network from the common single components. The system software essentially comprises the following components:

[0021] a program for controlling the network adapters,

[0022] a device driver for integration of the network adapter into the operating system,

[0023] a base library for controlling and processing the communication connection,

[0024] user libraries for standardized programming interfaces and environments,

[0025] a program for set-up, for managing and for controlling the computer network, as well as

[0026] service programs for configuration and administration of the computer network.

[0027] FIG. 2 shows a schematic representation of a communication architecture according to the invention, in which functions already explained in connection with FIG. 1 are given the same reference numbers. As shown in FIG. 2, three applications A, B and C, depicted by way of example, access a base library 18, in which a network selection unit 13 is integrated, possibly via an interposed programming environment 25 or 26, through communication interfaces 23 or 24. The network selection unit 13 may access the protocol unit 15 within the operating system kernel 10, downstream of which the first device driver 11 for the hardware or network card 16 of the standard network LAN is connected. Furthermore, the known system library 14 is assigned to the protocol unit 15.

[0028] Alternatively, the network selection unit 13 may access a second communication path 19, which connects the base library 18 directly with the hardware or network card 17 of the high performance network SAN, bypassing the operating system kernel 10. A device driver 12 is also assigned to the second communication path 19; however, it only executes management tasks and is no longer integrated into the actual communication. Because the network selection takes place in or immediately after the base library 18, i.e. before access to the operating system kernel 10, communication connections can be rerouted at an early stage to the faster second communication path 19 and in this way be supplied directly to the high performance network SAN, bypassing the operating system kernel 10. If a communication connection via the second communication path 19 is not possible, for example because the SAN environment is temporarily unavailable or the target can only be reached via the LAN environment, the system falls back to the first communication path, i.e. operating system communication and the standard network LAN.
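
The fallback described in paragraph [0028] can be sketched as follows. This is a minimal Python illustration under stated assumptions: the exception `PathUnavailable`, the function `send` and the two path callables are hypothetical names, and real communication paths would of course not be plain Python functions.

```python
class PathUnavailable(Exception):
    """Assumed signal that the second communication path cannot be used."""


def send(payload: bytes, target: str, san_path, lan_path) -> str:
    """Try the fast path first and fall back to the conventional path.

    san_path and lan_path are hypothetical callables taking (payload, target).
    Returns the name of the network that actually carried the message.
    """
    try:
        san_path(payload, target)
        return "SAN"
    except PathUnavailable:
        # SAN temporarily unavailable or target only reachable via the LAN.
        lan_path(payload, target)
        return "LAN"


def broken_san(payload, target):
    raise PathUnavailable("SAN environment temporarily not available")


def lan(payload, target):
    pass  # placeholder for operating system communication


print(send(b"hello", "node2", broken_san, lan))
```

The caller sees the same `send` call in both cases, which is the point of placing the selection in or directly after the library.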

[0029] The high performance network SAN used should be optimized for the demands of parallel processing. Functionality that is usually realized in software is thereby shifted to the network hardware. This applies in particular to

[0030] a) the scalability of the network, which avoids a drop in performance as the number of connected node computers increases,

[0031] b) the path finding within the network, whereby the protocols on higher layers are greatly simplified,

[0032] c) the loss-free data communication and the sequence maintenance of successive packets, whereby flow control mechanisms on higher layers are greatly simplified,

[0033] d) variable packet sizes, which avoid a waste of bandwidth, as well as

[0034] e) minimal communication protocols, which manage with little information and reduce the effort for creating the packets.

[0035] To provide slim communication protocols, all protocol tasks that can be relocated directly into the network hardware are relocated there. Examples are secured data transmission by flow control and the preservation of the sequence of packet streams. Furthermore, the available context information is used to design the communication protocols to be as slim as possible. In particular, the fact that a network of parallel computers is a closed network with a known number of nodes and a known topology simplifies, for example, the problems of path finding and path selection, since all possible paths can be statically calculated in advance, as well as the identification of nodes, since all nodes are known in advance and can be given a unique identification. In addition, the protocol used is not subject to compatibility restrictions arising from communication with foreign systems, since no foreign systems exist within the network of parallel computers. Altogether, the consistent exploitation of the available system knowledge leads to the complete elimination, for the second communication path 19, of the protocols that are usually anchored in the operating system kernel.
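
To illustrate why a closed network with known nodes and known topology simplifies path finding, the following Python sketch computes a complete next-hop table once, in advance, using breadth-first search. The function name `precompute_routes`, the dictionary representation of the topology and the node names are assumptions for illustration; the description does not state which algorithm is used.

```python
from collections import deque


def precompute_routes(topology):
    """Return {source: {destination: next_hop}} for a fixed, known topology.

    topology maps each node identifier to a list of directly connected nodes.
    """
    routes = {}
    for source in topology:
        next_hop = {}
        visited = {source}
        queue = deque()
        # Direct neighbours are reached via themselves.
        for neighbour in topology[source]:
            if neighbour not in visited:
                visited.add(neighbour)
                next_hop[neighbour] = neighbour
                queue.append(neighbour)
        # Every further node inherits the first hop of the node it was found from.
        while queue:
            node = queue.popleft()
            for neighbour in topology[node]:
                if neighbour not in visited:
                    visited.add(neighbour)
                    next_hop[neighbour] = next_hop[node]
                    queue.append(neighbour)
        routes[source] = next_hop
    return routes


# Hypothetical three-node chain a - b - c.
chain = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
table = precompute_routes(chain)
print(table["a"]["c"])  # first hop from a towards c
```

Because the table is fixed before any message is sent, no route discovery is needed at communication time, which is the simplification the paragraph refers to.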

[0036] Multiprocess capability, i.e. the ability of several processes to maintain communication connections at the same time, which is necessary for the functioning of the computer network, is achieved in the inventive system architecture by corresponding mechanisms within the base library.

[0037] From the user's perspective, the existence of standardized communication interfaces is of essential significance, since they allow a large number of applications to be ported to the respective target system without great effort. Furthermore, standardized communication interfaces guarantee that applications do not have to be specially adapted again when changing to a new computer generation. For this reason, according to the invention, the user is provided with a programming interface 23 that is syntactically and semantically equivalent to the interface of the first communication path, i.e. of operating system communication. On this basis, an application A or B can, possibly using a standardized programming environment 26, carry out communication via the base library 18. In addition, specially adapted versions of standardized (MPI = Message Passing Interface) or widely used (PVM = Parallel Virtual Machine) programming environments 25 are offered, which interact with the base library 18 via a special interface 24. The programming interface 23 is primarily suitable for applications from distributed data processing in local networks and for porting them to the inventive system. The programming environments 25 or 26, PVM and MPI, in contrast, represent the link to commercial parallel computers and the applications running on them.

[0038] In FIG. 3, the conventional communication architecture already described with reference to FIG. 1, shown on the left side, is directly compared with the inventive communication architecture described with reference to FIG. 2, shown on the right side; arrows running between the two diagrams indicate the relocation of individual steps of the communication process.

[0039] Arrow (1) in FIG. 3 describes the relocation of the access to the SAN network from the lower layers of the operating system directly into the base library 18. In this way, the communication architecture is released from all restrictions that are usually present within an operating system. This makes use of the ability of the memory management component present in the system to construct logical address spaces from physical memory areas at will. This basic principle is applied to the communication hardware and can be designated as user-level communication.

[0040] As indicated by arrow (2) in FIG. 3, the functionality provided in the protocols, which was until now located in the operating system kernel 10, in particular secure data transmission via flow control and the preservation of the sequence of packet streams, is relocated either directly to the SAN network hardware 17 or to the base library 18, so that the protocols for the second communication path 19, which were until now located in the operating system kernel 10, can be eliminated. If a programmable network adapter exists for the network hardware 17, the desired functionality can be provided exclusively by the network. According to the invention, the selection of the network, which is usually located in the operating system kernel between protocol servicing and the device driver, is relocated out of the operating system into the base library 18 (see arrow (3)). It is therefore possible to reroute communication connections to the faster second communication path 19 before they pass through the operating system kernel.

[0041] The mapping of operating system functionality from the operating system into the base library according to arrow (4) is realized by the multiprocess capability of the base library. The procedures necessary for this, which protect critical program segments and data areas by means of semaphores, are known as such from operating system design.
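
The protection of critical data areas by semaphores mentioned in paragraph [0041] can be illustrated with a short Python sketch. It is only an analogy under stated assumptions: the class `SharedSendQueue` is hypothetical, and Python threads stand in for the several processes that use the base library at the same time.

```python
import threading


class SharedSendQueue:
    """Hypothetical shared data area inside a base library (illustration only)."""

    def __init__(self):
        # Binary semaphore guarding the critical section.
        self._guard = threading.Semaphore(1)
        self._pending = []

    def enqueue(self, message):
        with self._guard:  # acquire on entry, release on exit
            self._pending.append(message)

    def drain(self):
        with self._guard:
            messages, self._pending = self._pending, []
        return messages


queue = SharedSendQueue()
workers = [
    threading.Thread(target=lambda i=i: [queue.enqueue((i, n)) for n in range(100)])
    for i in range(4)
]
for w in workers:
    w.start()
for w in workers:
    w.join()
print(len(queue.drain()))  # all 400 messages were recorded
```

Every access to the shared list passes through the semaphore, so concurrent users of the library cannot corrupt the data area.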

[0042] The programming interface 23 of an application is likewise mapped from the system library to the base library (see arrow (5 b)), and equivalent programming environments 25 are provided (see arrow (5 a)), which for their part access the base library 18 directly via the interface 24. Both measures serve to make it easier, better and faster to port applications to the inventive communication architecture.

[0043] The communication architecture illustrated so far provides considerable performance advantages over a conventional communication architecture in operating systems, but it also has rather disadvantageous side effects. On the one hand, the performance is achieved at the cost of a limitation with regard to the security of the communication interface; on the other hand, standard applications that want to use high-speed communication have to be linked with a special system library. To remove both of these critical points, the communication architecture depicted in FIG. 4 is proposed. In contrast to FIG. 3, the network selection 13 is relocated once more, from the library 18 into the operating system kernel 10, but ahead of the actual entry into protocol processing. This guarantees the usual security of communication interfaces in operating systems as well as the desired transparency of the communication interface towards the user applications, which now manage without special linking to the base library 18.

[0044] FIG. 4 shows in detail a further-developed embodiment of a communication architecture. Applications A and B access, if necessary via an interposed programming environment 26, the system library 14, which is connected in front of the operating system kernel 10. Directly after access to the operating system 10, the selection between the standard network LAN and the high performance network SAN is made in a network selection unit 13. When the standard network LAN is selected, the communication protocols are serviced in the protocol unit 15, downstream of which the device driver 11 for the LAN network hardware 16 is connected. When the high performance network SAN is selected, the SAN network hardware 17 can be accessed directly via a communication path 19. Here, the communication path 19 also includes a protocol layer 21 and a device driver 12, which, however, only serves management tasks and is not integrated into the actual communication.

[0045] In addition to the communication path 19, which, after network selection immediately following access to the operating system kernel 10, connects the system library 14 with the SAN network hardware 17, a base library 18 is provided outside the operating system kernel. It is accessed by an application C via an interposed suitable programming environment 25 and accesses the SAN network hardware 17 directly via the communication path 19′, which is located outside the operating system kernel. In this way, so-called unprivileged communication ends are provided, which allow access to the SAN network hardware 17 while bypassing the operating system, but which, in contrast to pure user-level communication, are subject to all protection mechanisms of the operating system. This results in very efficient control of the SAN network hardware 17 without bypassing the protection mechanisms of the operating system. Communication ends are self-contained units managed and protected by the operating system, each of which is exclusively assigned to one application, so that different applications use different communication ends and, for example, application A is not able to access an end of application B, although both communication ends are serviced via the same hardware.
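
The exclusive assignment of communication ends to applications can be pictured with a small Python sketch. The class `EndpointTable` and its methods are hypothetical and model only the ownership check; in the described architecture the protection is provided by the operating system and the memory management hardware, not by application-level code.

```python
class EndpointTable:
    """Hypothetical bookkeeping of communication ends and their owners."""

    def __init__(self):
        self._owner = {}   # endpoint id -> owning application
        self._next_id = 0

    def open(self, app: str) -> int:
        """Create a communication end assigned exclusively to one application."""
        endpoint_id = self._next_id
        self._next_id += 1
        self._owner[endpoint_id] = app
        return endpoint_id

    def check_access(self, app: str, endpoint_id: int) -> None:
        """Raise PermissionError unless the application owns the end."""
        if self._owner.get(endpoint_id) != app:
            raise PermissionError(f"{app} may not use communication end {endpoint_id}")


table = EndpointTable()
end_a = table.open("A")
end_b = table.open("B")
table.check_access("A", end_a)      # allowed: A owns this end
try:
    table.check_access("A", end_b)  # refused: the end belongs to B
except PermissionError as err:
    print(err)
```

Both ends live in the same table, just as both communication ends are serviced by the same hardware, yet each application can only use its own.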

[0046] In the communication architecture shown in FIG. 4 as well, the requirements for the high performance network SAN and the slim communication protocols described above in connection with FIG. 2 should be realized. Moreover, standardized programming interfaces and standardized or widely used programming environments are provided in this case too.

[0047] FIG. 5 shows the relocation of functionality and access points in a comparison between the conventional communication architecture described in connection with FIG. 1 and the new communication architecture according to FIG. 4; here too, arrows running between the two diagrams symbolize the individual relocations. The relocation of the access to the SAN network from the lower layers of the operating system into the protocol layer 21 of the communication path 19 within the operating system kernel and/or directly into the address space of an application, or into the base library 18 as part of the application, which is indicated by arrows (1 a) and (1 b), releases the system from all restrictions that are usually present within the operating system. This makes use of the ability of the memory management component to construct logical address spaces from physical memory areas at will. In combination with additional functionality within the network adapter, this results in unprivileged communication ends.

[0048] A large part of the functionality provided in common protocols is relocated according to arrow (2) directly into the SAN network hardware and into the protocol layer 21 of the communication path 19. What was said about the relocation of the protocol functionality in connection with FIG. 2 applies here correspondingly.

[0049] According to arrow (3), the selection of the network, which in conventional communication architectures happens between protocol servicing and the device driver, is moved ahead of the actual protocol servicing, in the example shown to immediately after access to the operating system kernel 10, so that communication operations can be rerouted early to the faster communication path 19. Here too, however, this rerouting only happens if the desired communication connection can be handled via the communication path 19. If this is not the case, the system falls back to conventional operating system communication. The relocation of functionality out of the operating system into the SAN network hardware according to arrow (4) is realized by the multiprocess capability of the offered communication interface in the form of independent communication ends. The procedure necessary for this, the protection of memory areas, is known as such and is performed on the hardware side by the memory management component of the computer.

[0050] If applications A and B use the regular communication interface of the operating system, i.e. the system library, the integration of high performance communication is completely transparent for these applications because of where the network selection is placed. However, if known programming environments for parallel computers (PVM or MPI) are employed, further optimizations within the communication path are possible. These are supported by offering equivalent or optimized programming environments, so that the application interface is relocated to this programming environment, as indicated by arrow (5) in FIG. 5.

[0051] In the communication architecture according to FIG. 4, network selection happens directly after access to the operating system. This usually requires modifications of the operating system kernel. If the operating system does not allow modifications at this position, an alternative communication architecture, as shown in FIG. 6, may be used. This architecture differs from the architecture according to FIG. 4 essentially in that the network selection 13 is relocated from the operating system kernel to an upstream PS-system library 22. This PS-system library 22 essentially combines the functionality of the conventional system library and the base library and offers the user the same external interfaces as the system library. If an application uses the PS-system library 22 instead of the regular system library 14, which still exists, all internal communication connections are handled, as far as possible, via the SAN high performance network.

[0052] Compared with the relocations of functionality and access points already explained in connection with FIG. 5, the following changes additionally result. The selection of the network (arrow (3)) is now relocated to the PS-system library 22, which therefore provides all functions of the actual system library. The relocation of the network selection to the PS-system library makes it possible to reroute communication connections early, i.e. before access to the operating system kernel and above all before the servicing of the standard protocols, to the faster communication path 19 of the SAN high performance network. Here too, however, this rerouting only takes place if the desired communication connection can also be serviced via this communication path 19. Otherwise, the system falls back to conventional operating system communication.

[0053] As already explained in connection with FIG. 5, the programming interface of an application is relocated from the system library to the base library (arrow (5 b)), and equivalent programming environments are provided (arrow (5 a)), which for their part access the base library directly.

1. Method for controlling the communication of single computers in a computer network, whereby the single computers are connected with each other via a standard network LAN and a high-performance network SAN and whereby each single computer comprises in an operating system kernel (10) a protocol unit (15) connected with said standard network LAN for servicing communication protocols and a library (14; 14, 18; 14, 18, 22) connected in front of said operating system kernel (10), on which applications (A, B) are layered via a communication interface (23, 24), whereby the selection between said standard network LAN and said high-performance network SAN happens in a network selection unit (13), characterized in that said network selection happens after the communication interface (23, 24) of said library and before or immediately after access to said operating system kernel (10).
2. Method according to claim 1, characterized in that said network selection happens before said access to said operating system kernel (10) and said library is connected with said high-performance network SAN via a communication path (19), which bypasses said operating system kernel (10).
3. Method according to claim 1, characterized in that said network selection happens after access to the operating system kernel (10) and said library is connected with said high-performance network SAN via a communication path (19) and that a further communication path (19′) is provided, which directly connects a further library with said high-performance network SAN.
4. Method according to claim 3, characterized in that said further communication path (19′) is located outside said operating system kernel (10).
5. Method according to any of claims 1 to 4, characterized in that said network selection takes place depending on target addresses given by said application.
6. Method according to any of claims 1 to 5, characterized in that said standard network LAN is accessed if a communication connection to said high-performance network SAN via said communication path (19) or said further communication path (19′) is not possible.
7. Method according to any of claims 1 to 6, characterized in that protocol tasks are relocated to the network hardware (17) for providing slimmer communication protocols.